Improving the Prediction Accuracy of Gene Structures in Eukaryotic DNA with Low C+G Contents
Abstract
We have developed a gene prediction program, GeneKey. When trained with the widely used dataset collected by Kulp and Reese, GeneKey achieves high prediction accuracy for genes with moderate and high C+G contents. However, its prediction accuracy is much lower for CG-poor genes. To tackle this problem, we construct a new dataset, LCG316, composed of gene sequences with low C+G contents. For CG-poor genes, the prediction accuracy of GeneKey trained on the LCG316 dataset is markedly improved. Further statistical analysis demonstrates that some structural features of CG-poor genes, such as splicing signals and codon usage, are quite different from those of CG-rich genes. Combining the two datasets enables GeneKey to achieve high and balanced prediction accuracy for both CG-rich and CG-poor genes. These results imply that careful construction of the training dataset is very important for improving performance on various prediction tasks. The GeneKey program is available at http://infosci.hust.edu.cn.
Keywords: DNA sequence, prediction of gene structure, prediction of protein coding region
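The abstract turns on two sequence statistics: C+G content, which separates CG-poor from CG-rich genes, and codon usage, one of the features reported to differ between them. The sketch below shows one straightforward way to compute both in Python. It is an illustration under stated assumptions, not GeneKey's implementation: the function names, the 0.40 cut-off in `is_cg_poor`, and the use of total variation distance to compare codon-usage profiles are all hypothetical choices.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # no usable bases (e.g. all N)
    return (seq.count("G") + seq.count("C")) / acgt


def is_cg_poor(seq: str, threshold: float = 0.40) -> bool:
    # HYPOTHETICAL cut-off for illustration only; the abstract does not
    # state the C+G threshold used to assemble the LCG316 dataset.
    return gc_content(seq) < threshold


def codon_usage(cds: str) -> dict:
    """Relative frequency of each codon in an in-frame coding sequence.

    Assumes cds starts at the first codon position. A trailing partial
    codon and any codon containing a non-ACGT symbol are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def usage_distance(p: dict, q: dict) -> float:
    """Total variation distance between two codon-usage profiles.

    0.0 means identical profiles; 1.0 means no codon is shared.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    # Toy sequences, not taken from either training dataset.
    genes = {"geneA": "ATGAAATTTTAA", "geneB": "ATGGCCGCGGGCTAG"}
    for name, seq in genes.items():
        print(name, round(gc_content(seq), 3), is_cg_poor(seq))
    print(usage_distance(codon_usage(genes["geneA"]), codon_usage(genes["geneB"])))
```

To compare the two classes, one could partition a set of training genes with `is_cg_poor`, pool codon counts within each group, and report `usage_distance` between the pooled profiles. A large distance would be consistent with the abstract's observation, but this sketch does not reproduce the paper's statistical analysis.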
Publication date: 2006